On the Growth Rate of Non-Enzymatic Molecular Replicators 
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It is well known that non-enzymatic template directed molecular replicators X + nO — > 2X 
exhibit parabolic growth d[X]/dt oc fc[X]^/'^. Here, we analyze the dependence of the effective 
replication rate constant k on hybridization energies, temperature, strand length, and sequence 
composition. First we derive analytical criteria for the replication rate k based on simple ther- 
modynamic arguments. Second we present a Brownian dynamics model for oligonucleotides that 
allows us to simulate their diffusion and hybridization behavior. The simulation is used to generate 
and analyze the effect of strand length, temperature, and to some extent sequence composition, on 
the hybridization rates and the resulting optimal overall rate constant fc. Combining the two ap- 
proaches allows us to semi-analytically depict a fitness landscape for template directed replicators. 
The results indicate a clear replication advantage for longer strands at low temperatures. 

Keywords: non-enzymatic molecular replication; growth rate; product inhibition; reaction kinetics; Brownian 
dynamics 



I. INTRODUCTION 

Optimizing the yield of non-enzymatically self- 
replicating biopolymers is of great interest for many ba- 
sic science and application areas. Clearly, the early or- 
ganisms could not emerge with a fully developed enzy- 
matic gene replication machinery, so it is plausible that 
the first organisms had to rely on non-enzymatic repli- 
cation Most bottom up protocell models also rely 
on non-enzymatic biopolymer replication 04^, which is 
also true for a variety of prospective molecular comput- 
ing and manufacturing applications Common for all 
of these research areas is the interest to obtain an opti- 
mal replication yield in the absence of modern enzymes. 
Depending on the details the biopolymer can be deoxyri- 
bonucleic acid (DNA), ribonucleic acid (RNA), peptide 
nucleic acid (PNA), etc. In the following we'll refer to 
them as XNA. 

Conceptually, XNA replication proceeds in three basic 
steps: (a) association, or hybridization of n nucleotide 
monomers or oligomers with a single stranded, comple- 
mentary template; (b) formation of covalent bonds in a 
condensation reaction, called polymerization in case of 
monomer condensation and ligation in case of oligomers; 
and finally (c) dissociation, or dehyhridization of the 
newly formed complementary strand; 




FIG. 1. Minimal template directed replicator: two com- 
plementary oligomers hybridize to a template strand (up- 
per part). An irreversible ligation reaction transforms the 
oligomers into the complementary copy of the template. The 
newly obtained double strand can dehybridize (lower part) 
thus allowing for iteration of the process. We assume that 
ligation is rate limiting, which implies that hybridization and 
dehyhridization are in local equilibrium. 



X+nO^ XOn^ XX+{n-l)W ^ X+X+{n-l) 



(a) 



(b) 



(c) 



j^-j ing group of the condensation reaction, and 



(2) 



(1) 

Here, X and X denote the template strand and its com- 
plement, O denotes monomers/oligomers, W is the leav- 
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are the equilibrium constants of the three reactions. Note 
that the left hand transition of reaction scheme ^ is an 
abbreviation of a multi-step process that accounts for 
all n individual oligomer hybridizations and dehybridiza- 
tions, which is only partly captured by the net reaction 
process. 

The covalent condensation reaction is entirely activa- 
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tion limited. For nucleotide monophosphates, the leaving 
group corresponds to water which (due to its high con- 
centration in aqueous solution) pushes the equilibrium 
to the hydrolyzed state. Product yields are significantly 
increased when using activated nucleotides, such as nu- 
cleotide triphosphate or imidazole. 

Due to its complex reaction mechanism, non- 
enzymatic XNA replication from monomers suffers 
from various complications, namely extension of se- 
quences containing consecutive adenine and thymine nu- 
cleotides 10,, llj, as well as side reactions such as partial 
replication and random strand elongation p^ . Conse- 
quently, experiments with activated nucleotides typically 
show little yield in aqueous solution, although results can 
be enhanced by employing surfaces (e.g. clay) or up- 
concentration in water-ice [2, [3] • 

Replication from short activated oligomers, on the 
other hand, does produce high yields for both RNA and 
DNA [IJtIM and references therein]. In particular, this 
observation has lead to the development of minimal repli- 
cator systems, in which ligation of two oligomers is suf- 
ficient to form the complementary replica (see Fig. [T}. 
One of the reasons why these systems outperform repli- 
cators that draw from monomers is that the above side 
reactions expectedly occur, if at all, only to a negligible 
extent. 

Neglecting both the production of waste as well as the 
hydrolysis of the ligation product, but explicitly tak- 
ing into account the individual oligomer associations, 
minimal replicator systems (here for the case of a self- 
complementary template) can be written as 



X + 20^ XO + ^ XO2 — 



X2 ^ 2X. 



(3) 



In this article, we first develop a theoretical expres- 
sion for the template directed replication rate of mini- 
mal replicator systems as a function of strand length and 
temperature. This analytical model provides transparent 
physical relations for how temperature, strand length- 
and composition impact the overall replication rate. We 
then present a 3D, implicit solvent, constrained Brownian 
Dynamics model for short nucleotide strands, i.e. strands 
with negligible secondary structures. The model does not 
attempt to be (quantitatively) predictive. In particular, 
we do not attempt to calibrate interaction parameters 
to experimental data, which prevents any sequence pre- 
diction. On the contrary, it is our aim to demonstrate 
that much of the replication properties of oligonucleotides 
arises from rather general statistical physics. The simu- 
lation is used to measure diffusion coefficients, effective 
reaction radii, and hybridization rates and their depen- 
dence on temperature, strand length, and, to some ex- 
tent, sequence information. This allows us to qualita- 
tively obtain equilibrium constants Kq and and to 
qualitatively sketch the effective replication rate fc as a 
function of strand length and temperature. Our analysis 
focuses on minimal replicator systems in the context of 



chemical replication experiments as employed in protocell 
research and manufacturing applications, where the re- 
searcher controls reactant concentrations as well as most 
experimental parameters. However, we also discuss the 
impact of our findings in the context of origin of life re- 
search, were possible side reactions cannot be neglected. 



II. PARABOLIC GROWTH AND 
REPLICATION RATE 

Following and extending the derivation of Refs. [l^, 
[20| . we assume that ligation is the rate limiting step. 
This translates into the following conditions for the rate 
constants: 

kL [XO2] < fc+ [X] [O] ki^ [XO2] < k+ [XO] [O] 

kL [XO2] < fco [XO] ki^ [XO2] « k^ [XO2] (4) 

ki^[X02]<^k+[XY ki^[X02]^k^[X2] 

One can then assume a steady state of the hybridiza- 
tion/dehybridization reactions and express the total tem- 
plate concentration [X] total = [X] + [XO] + [XO2] + 2[X2] 
in terms of equilibrium constants as 

[X]totai - (1 + Ko[0] + KlpY) [X] + 2Kt[X]\ (5) 

When solved for [X], this gives 



[X] = 



8 At [a:; 



total 



{l + Ko[0]+Kl[OYY 



l + Ko[0]+Kl[OY 



(6) 



Template directed replication typically suffers from 
product inhibition, where most templates are in dou- 
ble strand configuration, i.e. ATxiAJtotai ^ 1- Over the 
course of the reaction, this is tantamount of saying that 
^8i^T[A]totai ^l + Ko[0] + Kl [O] 2 . This allows us to 
approximate 



8ifT[A]totai + (1 + Ko[0] + KlPYY 



= v/8ArT[A]total 

and simplify ([6]) to 



{l + Ko[0]+Kl[OYY 

+ 0([A]total) (7) 



[X] 



[X] 



total 



2K^ 



O(mtotal). 



(8) 



This is a lower bound of the single strand concentration, 
which is approached in the limit of vanishing oligomer 
concentration. By combining ^ and ([5]), we get 



d[X] 



total 



dt 



ki^[X02]^ki^Kt,[0'][X] 



k [0]V[^]total 



(9) 
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with 



(10) 



This weU-estabhshed parabohc growth law is known to 
quahtatively aher evolutionary dynamics of XNA based 
minimal replicators and to promote coexistence of repli- 
cators rather than selection of the fittest [UlllH Con- 
sequently, several strategies have been designed to over- 
come product inhibition in order to reestablish Darwinian 
evolution and survival of only the fittest [Tsl . [20l [23l . [23 | . 
Most of these approaches hinge on a mechanism to lower 
the hybridization tendency of the product to the tem- 
plate. In this article, however, we accept parabolic 
growth and instead focus on the effective growth rate. 

The key observation of equation (fTO|) is that, due to 
the steady state assumption, the overall growth rate is 
independent of the individual association and dissocia- 
tion rates k^,k~, but only depends on the equilibrium 
constants Kq and Kt- Expressed in free energy changes, 
Eq. PU)) becomes 



k = f^[^ogA+{^AGT-2AGo-AGl)]/ksT 



(11) 



where A and AGl are the pre-exponential factor and 
activation energy of the ligation reaction, respectively, 
and we have used the Arrhenius equation 



Ae 



-AGl/k-eT 



(12) 



We further observe that any potential optimum of (fTU)) 
must obey 

2ki^'KTKo = ^ki^KoKr' - Aki^K^K'^ (13) 

where the prime indicates derivation with respect to any 
variable. Note that derivatives of ki^^K^, and Kq can 
be taken with respect to parameters such as temperature 
and template length. In sequence space, however, we 
do not have an ordering that would allow us to perform 
derivatives. Therefore, equation ([T^ can only give us 
partial information about an optimal growth rate. 

It is well-known that the equilibrium constants Kq and 
i^T depend on various parameters such as temperature, 



salt concentration, strand length, and sequence informa- 
tion - all being relevant control parameters when design- 
ing replication experiments or delimiting origin of life 
conditions j2^, [2^. Furthermore, the two rates are in- 
terdependent as one expects -ftTx to rise with increasing 
Ko. 

Qualitatively, the free energy of XNA hybridization 
obeys a form given by 



AG'(7V,T) 



^AGbase 
A^(AFbasc 
+ AiJinit - 



- AGinit 
rA^baso 
TA^init, 



(14) 



where N signifies the strand length, AGbaso is the 
(maximal) energy change per base, AGinit is the initi- 
ation energy and AiJbaso, AS'base are negative, whereas 
AiJinit , AS'init are positive. The right hand side of the 
equation expresses a saturation in the free energy per 
base as a function of the strand length; the free energy 
gain for each base pairing asymptotically becomes con- 
stant for long strands [23|. Inserting ([T^ into (|lip and 
separating out the rate constant for the ligation reaction 
/cl, we obtain: 



^AG{N,T}-2AG{N/2,T) | /fceT 



OC e 



1 3 

-A'AGba=c + -AGi„it I/ZcbT 

2 2 J (15) 



which, when differentiated for T, yields a positive depen- 
dence on temperature, iff 



> 



iV < -- 



3AHu 



Since Ai?init > 0, and AiJbasc < 0, this critical strand 
length is truly positive. It might surprise that Kq/ y/2K^ 
can increase with decreasing temperature - the regime 
where templates are primarily inhibited by the prod- 
uct. The results become understandable when consid- 
ering that oligomers, with their lower hybridization rate, 
barely associate with the template if the temperature is 
raised. 

Reintroducing the ligation reaction, this relation gets 
refined to 



^ In particular, it has been shown that under parabohc growth 
conditions, competing replicators Xi grow when sufficiently rare: 



dt 



[X,] > 0. 



The equation captures the connection between the growth rate 
ki and its selective pressure, such that replicator species with a 
high growth rate are also assigned a high evolutionary fitness. 
See Ref. [21II for the derivation. It is this relation that allows us 
to speak of a fitness landscape when referring to the functional 
shape of k. 



'gA-fiA'AGba.c + ^AGinit+AGi I //cbT 



with the critical strand length 



dk 
dT 



> 



N <N* = -- 



3AHi 



init 



2Ag^ 



(16) 



(17) 



In words: we can identify a critical strand length N* 
above which the overall replication rate k increases with 
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FIG. 2. Effective replication rate k (given by equation I16p 
as a function of strand length and temperature. For strands 
below a critical length A'^* (here 10) the rate increases with 
temperature, for strands longer than A'"* , the replication rate 
grows with decreasing temperature. The value of A'^* is de- 
termined through equation (|17|l . Note the saddle point of 
the surface where T* and A''* intersect (Eq. I13|l (A/Zbasc = 

-l.bksT', AS-basc = -IfcB, AHinit = O.SO/CbT', A^init = 

1.25fcB, A_ff*=5.25fcBT', A = 10^) 



decreasing temperature. This critical strand length is de- 
termined by the hybridization enthalpies ATJbasc, ^Hinit, 
and activation enthalpy change AiJ^ of ligation.. 

Fig. [2] depicts the graph of the replication rate land- 
scape (ITSl) that clearly identifies the optimum of equa- 
tion p3p as a saddle point. The corresponding tempera- 
ture T* where k changes its scaling with respect to strand 
length is - independent of the ligation reaction - given 
by 



dk ^ 



T <T* ^ 



base 



(18) 



Can we obtain a higher replication rate by using non- 
symmetric oligomers? The rational behind this strat- 
egy is to increase the binding affinity of one oligomer to 
maybe decrease product inhibition. A simple refinement 
of equation allows us to capture this approach with 
our model: 

Kq^Kq^ nAGiN,T)-AG{Ni,T)-AGiN2.T))/kBT 



where Ni + N2 — N denote the lengths of oligomer 
strands Oi and 02- Thus, according to our simple 
thermodynamic considerations, non-symmetric variants 
of the replication process do not show more yield than 
the corresponding symmetric system: the binding affinity 
gained for the long oligomer strand is paid to hybridize 
the short oligomer strand. 

Fig. [2] seemingly implies that replication rates grow be- 
yond any limit for long templates, which is unphysical. 



To resolve this inconsistency, it is important to remem- 
ber that our findings are only valid in the regime where 
ligation is rate limiting. For very long XNA strands, how- 
ever, double strands are so stable that dehybridization of 
the ligation product is expected to become the rate lim- 
iting step. Independent of the exact shape of the growth 
law, the dominant factor of the effective growth rate is 
given by 



,+AGi„it)/feBT 



(20) 



where fc^ summarizes both pathways of either product 
rehybridization or hybridization of oligomers followed by 
ligation. As fc+ is composed of hybridization (i.e. diffu- 
sion plus orientational alignment) and ligation events, it 
varies only slightly with sequence length when compared 
to dehybridization rates for the case of large N. There- 
fore, the effective replication rate will be governed by the 
scaling 



k (X e 



(21) 



with the limit 



lim fc = 0, 



since AGbasc < 0. As a consequence, we expect a full 
non-equilibrium study of the replication process to show 
a proper maximum in the replication rate as a function 
of strand length. 



III. SPATIALLY RESOLVED REPLICATOR 
MODEL 



Spatially resolved template-directed replicators have 
been previously simulated in the Artificial Life commu- 
nity using two-dimensional cellular automata and contin- 
uous virtual physics [13, [H, . The model we present 
here is conceptually similar to, but simpler than other 
coarse-grained DNA models [e.g.. I30l - l32j |. Compared to 
our earlier work on hybridization and ligation [33| . the 
model presented here is less computationally expensive 
while simultaneously being broader in its range of appli- 
cation. 

We model nucleic acid strands as chains of hard spheres 
that are connected by rigid bonds. Each sphere has mass 



radius r, position and velocity (x^, v^) e 



p3x3 



as well 

as moment of inertia 6, orientation and angular momen- 



tum (ci;i,Li) e 



representing the spatial orienta- 



tion of the respective nucleotide. Further, each sphere 
has a type ti £ {A, T, C, G}, and we define A and T (C 
and G) to be complementary. The model is implicit in 
the sense that solvent molecules are not represented ex- 
plicitly, but only through their effect on the nucleotide 
strands. We model the (translational and rotational) mo- 
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tion of each sphere by a Langevin equation 
a 

L, = -VJ7,(x,u;)-7^+ e.- 



(22a) 
(22b) 

(22c) 
(22d) 



Here, 7 is the friction coefficient, and ^.^ are zero mean 
random variables accounting for thermal fluctuations. 
Together, friction and thermal noise act as a thermo- 
stat: they equilibrate the kinetic energy with an external 
heat bath whose temperature iseiven by the following 
fluctuation-dissipation-theorem |3J|: 

{Ut):ij{t')) = 27fcBT 5,,5{t - t') (23a) 

UtYUt')) = 2jfcBT 6,j6{t - t'). (23b) 



Hence, a temperature change directly translates into a 
change of the Brownian noise amplitude. We use the 
moment of inertia for solid spheres 9 = '^mr^ - noting 
that one could, in principle, use moment of inertia tensors 
to reflect the geometry of the individual nucleobases. 

Equations (I22ap - (j22dp are solved under the con- 
straints 



|xi - Xj l = rbond if i,j bonded 



= 2r if 

= 1 



< 2r 



(24a) 
(24b) 
(24c) 



to account for rigid bonds (I24ap and hard spheres (|24bP . 
By setting rbond < 2r, we can assert that strands do not 
penetrate each other. We define the following angles (see 
Fig.E]): 



COS! 



COS( 



COS -^i 



Xj X^ X/i, Xj 
^bond ^bond 
- Xj 

— 

^bond 
COS UJij = [uJi ■ OJ 
X.> Xi 



i,j and i, k bonded 

i,j bonded 
i, j bonded 
i,j not bonded. 



As much of the molecular geometry is already deter- 
mined through the constraints, the innermolecular po- 
tentials U and U only need to account for strand stiff- 
ness (|25ap . base orientation (j25bp . and 7r-stacking (|25cl) . 
We set: 



T rbond 

Ui — abend 



^ortho 



/Vparallcl - 

'-^ ij — apai'allcl 



1 ) if i not terminal (25a) 
2 

(25b) 
(25c) 




FIG. 3. Geometry of the nucleotide strands. The figure shows 
the angles that define inner- and intermolecular interactions 
for one nucleobase (shaded in grey). 



The minimum energy state of these definitions are 
stretched out nucleotide strands with orientations per- 
pendicular to the strand and parallel to each other. 

In addition, we define the following intermolecular po- 
tentials between non-bonded complementary nucleobases 
i and j: 



U. 



hybrid 



'Ohybrid d ( |xj - Xj I ) cos Ipi 



U, 



hybrid _ -aj^yj^j.;^ d(|Xj - Xjl) ( ^ - 1 



(25d) 
(25e) 



if |x,; - x,-| < re 



d{rij) 



The shift and weighing function 
, - 2r 



2r 



asserts that the potentials take on a minimum at particle 
contact and level out to zero at the force cutoff radius 
Tc. Equation (|25dp allows for a nucleobase i to attract 
its complement j along the direction of uji, while (|25ep 
orients u^i toward the complement. 

Taking the above potentials together, we define 

[/.(x,a;) = C/^"d(x)+ J2 f^l7'"'(^''^) (25f) 

non-bonded 
complementary 



(x,u;)^ ^ (t>-*'^°(x,. 



parallel 



bonded 



(25g) 



non-bonded 
complementary 



Equations (|22ap to (|24cp are numerically integrated us- 
ing a Velocity Verlet algorithm that, in each iteration, 
first computes unconstrained coordinates which are af- 
terwards corrected with a Shake algorithm to satisfy 
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parameter 


value 


comment 


Eqs. 


m 


1 


particle mass 


Il22bll - (I22dll 


7 


3 


friction coefficient 


Il22bll. Il22dll 


ksTo 


1 


equilibrium temperature 


|[23all. Il23bll 


At 


0.05 


numerical time step 




r 


0.25 


particle radius 


Il24bll 


^bond 


0.45 


bond length 


Il24all 


rc 


1 


force cutoff radius 


Il25dll - Il25ell 


^bend 


5 


strand stiffness 


Il25all 


^ortho 


2.5 


angular stretching 


ll25bt 


'^parallel 


1 


angular alignment 


ll25ct 


'^hybrid 


10 


angular hybridization 


ll25et 


^hybrid 


1 


complementary attraction 


ll25dt 



TABLE I. Model parameters in reduced units (unless other- 
wise noted). 
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kBT=l 


o 






V T T 


□ 






kBT=3 


A 






D-kBT/63tTil - 








A 


A , 


12 3 4 


5 6 7 8 


9 10 



strand length N 

FIG. 4. Diffusion coefficients measured for different strand 
lengtlis and temperatures (symbols) fitted to the prediction of 
the Einstein-Stokes relation (solid lines). For each parameter 
pair, 40 simulation runs over lOOOr have been averaged. 



the constraints [ssjH Typical system configurations are 
shown in Fig. [TJ 



IV. SIMULATION RESULTS 

In the subsequent analyses, we will employ reduced 
units, i.e., m = 1, Tc = 1, and fcelo = 1 define the units 
of mass, length, and energy. From this, the natural unit 
of time follows as 



T = r^m/kBTo. 

The parameters r and rbond are chosen to prevent 
crossing of strands (rbond < 2r). The ratio rbond/'' de- 
termines the double strand geometry which is modeled 
more sparse than in actual nucleic acid strands in order 
to compensate for the relatively shallow potentials of the 
coarse-grained model. The ratio r/r^ determines the dis- 
tance at which complementary bases "feel each other" 
and has been set to two times the bead diameter. 

The amplitudes abend, Oortho, Oparaiiei of the potential 
functions are chosen in order to promote stacked single 
strands for temperatures up to at least S/cbTq and to 
loosely match the persistence length of DNA (three nu- 
cleotides) at unit temperature. Finally, the values for 



^ Note that our approach would not work in the absence of a ther- 
mostat; to describe rotational motion properly, one would need 
to define orientations and angular momenta in a local reference 
frame that moves with the extended object to which the ori- 
ented point particle belongs. In this manner, rotational motion 
of the extended object gets propagated down to the angular mo- 
menta of the particles it consists of (A QShake algorithm would 
in addition be needed to properly conserve angular momenta in 
the constraints). While this approach is computationally sig- 
nificantly more cumbersome, we expect the result to be similar 
for the above model, in which rotation of extended objects is 
propagated down to its constituting particles through angular 
potentials and an overdamped thermostat. 



flhybrid and flhybrid are chosen to enable hybridization at 
unit temperature. We point out that our model utilizes a 
high value Shybrid in order to promote fast hybridization. 
A list of all model parameters (unless otherwise noted) 
is given in Table HI 

Note that it is not mandatory to relate one bead of 
the model to one physical nucleotide. Instead, each bead 
could also represent a short XNA subsequence (e.g. 2-4 
nucleotides). While this would result in a closer match of 
the ratio rbond /r, the amplitudes of the potential func- 
tions would need to be adapted to reflect the changed 
representation. 



A. Diffusion 

In dilute solution, DNA diffusion depends primarily on 
temperature and strand length, as opposed to primary or 
secondary structure. In the limit of low Reynolds num- 
bers, the diffusion coefficient of a sphere is given by the 
Einstein- Stokes equation 



D 



dirrjr 



(26) 



where rj is the viscosity of the medium and r the radius 
of the sphere. 

In order to compare our model polymer diffusion to 
(|26p . we perform simulations of single homopolymers 
(e.g. poly-C) and determine the diffusion coefficient from 
its measured mean square displacement 



^_ l |x(AO-x(0)r 
6 At 



Fig. |4] shows results for strands of lengths = 1, . . . , 10 
and temperatures /cbT — 1, . . . , 3. The data sets a scaling 
relation between our model parameter N and the Stokes 
radius r which is a priori not known. For the general 
scaling relation r oc N'^ we obtain the most likely expo- 
nent from data fitting via v and rj as 1.06. By setting 
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0.25 - 



14 16 



strand length N 



= 0.60 
= 0.68 
= 0.76 



o 

A 
□ 



= 0.81 



boundaries 



1 

0.5 



0.25 ' 



self-complementary 




0.2 kgT 
(1-1)"; v = 0.58 
(1-1)" 



6 8 10 12 14 16 

strand length N 



(1-1) ;v = 0.76 
1.0 knT 



(1-1)^ 



V = 0.68 
0.5 k^T 

V = 0.70 



(1-1) ;v = 
(1-1)^; v = 



0.80 
0.81 



FIG. 5. Radius of gyration measured for different strand 
lengtlis and bending potentials (symbols) fitted to the pre- 
diction of the Flory mean field theory (solid lines). For each 
parameter pair, 40 simulation runs over 400r have been av- 
eraged. The upper panel shows results for homopolymers 
(e.g. poly-C), the lower panel compares those to radii of self- 
complementary strands. The plots also show the boundaries 
for maximally stretched chains {v = 1 - upper dotted line) 
and the expectation value of an ideal chain (v = 3/5 - lower 
dotted line). 



1^ = 1, and equivalently r cc N, we obtain the best agree- 
ment between measurement and theory (by fitting via rj 
only) for ry ~ O-OGlfceTr^/r^ (see solid lines in Fig.H]). 



B. Radius of gyration 



Again in dilute solution, the radius of gyration 

N 

N -~ 



Rl = 



1 ^ 

—y\ 



^mcan | ; 



with Xmoan being the center of gravity of the chain, is 
expected to depend on chain length and temperature (or 



equally the backbone stiffness abend)- As opposed to dif- 
fusion, we do expect the radius of gyration to change 
with the primary structure of the nucleotide strand. For 
homopolymers, we expect i?g to be well described by the 
Flory mean field model ^] 

i?g oc (TV - 1)". 

We perform simulations of single homopolymers and 
self-complementary nucleotide strands and determine the 
radius of gyration. Fig. [5] shows results for strands of 
lengths = 4, . . . , 16 and various backbone stiffness val- 
ues. It is found that the Flory model is a good pre- 
diction, not only for homopolymers, but also for self- 
complementary strands. Expectedly, the radius of gy- 
ration is smaller for self-complementary strands. For 
Obcnd — 0.2, we find the radius of gyration of self- 
complementary strands to be slightly longer than the ra- 
dius of gyration of a homopolymer with half the length 
- implying that the strand is almost always in a hairpin 
configuration. For stronger backbone stiffness values, the 
effect is reduced. 



C. Melting behavior 

We analyze the melting behaviour [A] 2 ^ 2 [A] of com- 
plementary nucleotide strands as a function of tempera- 
ture for various strand lengths and sequences. We con- 
sider a base to be hybridized if there is a complemen- 
tary base of another strand within a maximal distance of 
Tc. Denoting the fraction of hybridized nucleobases with 
< X ^ Ij can compare the melting curves to the 
theoretical prediction 



X{T)={1 



AH-TAS 



(27) 



where A_ff, AS are constants depending on template 
length, sequence, and concentration. 

Fig. ini shows melting curves for 18 different sequences 
and fits (via AHi,ASi) to the theoretical prediction, 
where each panel analyzes sequences that are subse- 
quences of a common master sequence denoted in each 
panel. The graphs clearly show how the average hy- 
bridization increases with strand length for each master 
sequence. Inlays, where present, emphasize that the in- 
verse of the melting point, at which x{Tm) = 0.5, scales 
linearly with the inverse of the strand length. 

Comparing the individual panels to each other, we find 
that melting temperatures for strands of equal length are 
higher for sequences with identical adjacent bases. In 
fact, the melting behavior is dominated by the presence 
of identical adjacent bases: adding a single nucleobase to 
a strand that consists otherwise only of identical pairs 
(i.e., moving from length 4 to 6 and from length 8 to 10 
in panel two) has no significant impact on the observed 
melting temperature. We assume that this behavior is 
due to the fact that dehybridized nucleotides of a partly 
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ACGCATACGCAT 




CTACTAGGGGGG 



1.5 

temperature kgT 



N 


= 2 


N = 6 ^ 


N = 


10 




N 


= 4 ^ 


N = 8 ^ 


N = 


12 





FIG. 6. Systems of size 10^ are initialized with two comple- 
mentary strands of length A'^. The sequence information is 
taken from the A'' central nucleotides of the master sequence 
denoted in each panel (e.g., A'' = 6 implies sequence CATACG 
in the first panel). Each system is simulated over 50000r, 
and the average fraction x of hybridized nucleobases is deter- 
mined. Error bars show the average and standard deviation of 
40 measurements. Solid lines show the theoretical prediction 

/ Aff-TAS \ -1 

X{T) = I 1 -I- e RT J fitted individually to each data 
set via AS and Aff . Melting temperatures Tm are obtained 
from the relation xiTm) ~ 0.5, and their scaling as a function 
of strand length is depicted in the inlays for the cases where 
enough melting points had been observed. 



molten strand find more potential binding partners to 
enforce the stability of the partly molten strand, thereby 
promoting recombination of products. 

We emphasize that our model is not suited to obtain 
quantitative sequence dependent melting temperatures. 
While it is well known that not only strand composi- 
tion, but the actual arrangement of bases influences the 
melting temperature [s^l , the magnitude of this effect is 
not expected to be captured quantitatively by our model. 
Nevertheless, the results confirm that sequence informa- 
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N = 


12 






temperature kgT 



FIG. 7. Melting curves for an oligomer that hybridizes to the 
left hand side of the master sequence in the presence of the 
right hand side oligomer. Data is obtained with the procedure 
described in Fig. |6] For the analyzed master sequence, the 
results are comparable to those of two complementary strands 
of length N/2 (dotted lines). 



tion affects the melting temperature, and that the melt- 
ing temperature rises when a strand is elongated. 

Up to now, we have analyzed hybridization of two com- 
plementary strands of equal length. How is the stability 
of the hybridization complex affected if one of the strands 
is replaced by two oligomers of half the length? We an- 
alyze the master sequence CTACTAGGGGGG. Its left 
half is similar to the first sequence of Fig. [5] with respect 
to non-identical neighboring bases. The right half has 
been chosen for its strong hybridization tendency. We 
run experiments as before and measure the hybridization 
of the left oligomer. By comparing its equilibrium rate 
to the one for two templates of half the length, we can 
determine how the dangling right hand side affects the 
equilibrium rate (e.g., we compare the hybridization of a 
4-mer to an 8-mer template to the hybridization of two 
4-mers.) Fig. [7] shows that the hybridization fractions 
Xo(N) and xt{N/2) are comparable for the analyzed se- 
quence. We expect, however, that xo decreases when the 
two oligomers have more interaction possibilities than in 
the selected master sequence. 



D. Effective replication rate 



We can roughly equate 



1-A 



[X] 



[X] total total 

and obtain an estimate for the equilibrium constant 
[^2] _ X 1 



[Xp-2(l-x)2 [X] 



total 



(28) 



from the measurements. This equation has to be taken 
with some caution because the measured hybridization 
times reflect a non-trivial relation between diffusing re- 
actants and rehybridization of partly molten complexes - 
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both scaling difTerently with concentration. To truly ob- 
tain K, one is advised to repeat the simulations with 
varying concentrations, i.e. box size. By means of 
Eq. (|28p . we convert the melting data from Sec. IIV CI 
to obtain hybridization energy changes 



-fceTlogXT 



2(1 

+ fcBTl0g[X]total. 

In the latter equation, —fee log [X] total denotes the trans- 
lational entropy for a box of size [^]^tai' while AS'init ac- 
counts for the configurational entropy of a single strand. 
We combine both entropy terms, AS" = AS'init + 
fcBlog[X]-^t^i, and fit 



iVAGba 



Ai/i„it - TAS' 



-fceTlog 



X 



2(1 



which allows for better comparison to the melting tem- 
perature plots, as AGt = -4=^ X = 1/2- 

Determining hybridization energies is difficult because 
hybridization is very stable at low temperatures, par- 
ticularly for long XNA strands. Dehybridization then 
becomes a rare event, which requires unfeasibly long 
simulation runs in order to sample equilibrium distribu- 
tions. Consequently, data for low temperatures and long 
strands is overshadowed by noise and we have excluded 
such data from the analysis. For the regime that is acces- 
sible to simulation. Fig. [5] shows the measured hybridiza- 
tion energies fitted to the theoretical model of Eq. (IT4|) - 
see figure caption for details. The data follows the linear 
trend of the model and can recover the proper temper- 
ature scaling. However, we also observe deviations from 
the analytical prediction for T > 1.9. The plot confirms 
that hybridization energy changes are close to zero at the 
melting temperature of each double strand. For the sim- 
ulation, where [Xjtotai = 0.001, we obtain ASinit = 1.33, 
which confirms that AGinit is primarily entropic. 

Unfortunately, the removal of noisy simulation results 
implies that we do not have measurements for the regime 
where the analytical model predicts the most features. 
To nevertheless obtain estimates for these points, we 
perform the same simulations as before but start with 
a perfectly hybridized complex. For short strands, the 
difference in initial conditions is negligible, as hybridiza- 
tion and dehybridization equilibrate within the simulated 
time span. For long strands / strands at low tempera- 
tures, the sampled x values progressively overestimate 
the equilibrium time fraction of hybridization. 

The rational behind these dehybridization measure- 
ments is the following: for long strands and low tempera- 
tures dehybridization of the ligation product becomes the 
rate limiting step and we are in the regime of Eq. (I2ip . 
Here, the effective replication rate is primarily governed 
by the rate of product dehybridization which in turn gives 
us an upper bound for the replication rate. By combin- 
ing the results of the two scenarios, we implicitly relate 
the simulated time span with an assumed ligation rate. 




10 

strand length 

FIG. 8. Hybridization energy changes AGt obtained from 
the measurements of section IIV CI sequence ACGCATACG- 
CATACGCATAC (symbols), fitted to the analytical model 
of equation (|14l) via the parameters Affbaae = —1.81, AS — 
-0.756, AHinit = 0.470, and AS" = -5.58. Since [XJtotai = 
0.001, we can estimate ASinit = 1.33. 
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FIG. 9. 



Effective equilibrium constant Kq / V^Kt, obtained 



from the measurements of Fig. [5] (colored) compared to the 
theoretical prediction of Eq. (|14|) (mesh). Data shaded in gray 
is extrapolated from dehybridization experiments. 



Plugging the measured constants Kt and Kq into 
Eq. (jlOl) , we obtain a fitness landscape for minimal repli- 
cators which is depicted in Fig. [S] The colored surface 
shows the effective equilibrium constant Kq/^/2Kt, ob- 
tained from the original measurements. The grey surface 
shows the results from the dehybridization experiments. 
Finally, the fitness landscape obtained via Eq. ([H)) is 
shown as a mash. For the analyzed master sequence and 
range of observation, the effective oligomer complex con- 
centration Kq/^/2K^ varies over 13 orders of magnitude 
with highest rates for long strands (A^ > 8) and low tem- 
peratures {kBT < 0.8). Contrary to the analytical deriva- 
tions, the numerical results of the dehybridization exper- 
iments indicate a saturation and possibly a decrease of 
the replication rate for long strands at low temperatures, 
thereby supporting our hypothesis that the effective rate 
indeed possesses an optimum when dehybridization and 
ligation occurr on comparable time scales. 
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FIG. 10. Final replication rate A; as a function of template 
length and temperature. The figure is produced by super- 
posing the data from Fig. |9] with the Arrhenius equation for 
the hgation reaction following Eq. (fT5|l with A = 10^, AHI = 
6.52fcBT'. For this parametrization, the critical strand length 
A''* above which the temperature dependence of the reaction 
inverts is 8. 

The numerical simulations do not incorporate the liga- 
tion reaction. To include its temperature dependence, we 
superpose the Arrhenius equation (|12p with parameters 
as in Fig. [2] onto Fig. ^ and obtain the replication rate 
landscape shown in Fig. [TU] The resulting figure features 
a critical strand length N* « 8 at which the tempera- 
ture scaling inverts. With the parameters obtained from 
the data fits, the critical temperature T* w 2.37 Tq lies 
outside the analyzed area. 



V. DISCUSSION 

The common strategy to increase the yield in template 
directed replication experiments is to increase the con- 
centration of oligomers. This is certainly viable, and the 
fact that the growth rate k is proportional to the square 
of the oligomer concentration encourages this approach. 
Our investigation, however, indicates that oligomer con- 
centration can be outweighed drastically by factors such 
as temperature, template length, as well as sequence in- 
formation, which all influence the replication rate at least 
exponentially and thus over many orders of magnitude. 
These finding are consistent for the simple analytical ex- 
pressions (Fig. [5]) , for the simulations (Fig. as well as 
for the combined analysis (Figs. Inland [TU]). 

Perhaps contrary to intuition, we find the highest 
growth rates for long replicators and low temperatures. 
This finding can be explained by the fact that the effec- 
tive growth rate of minimal replicators features a critical 
strand length N* at which the temperature dependence 
of the overall replication reaction inverts: below N* the 
replication rate is dominated by the ligation reaction and 
its positive temperature scaling, whereas above N*, the 



negative temperature scaling of the hybridization reac- 
tions becomes dominant, recall Fig. [TUl 

We observe that hybridization rates are highly se- 
quence dependent. In particular, our spatially resolved 
simulations reveal that adjacent identical nucleobases can 
drastically stabilize the hybridization complex. We ex- 
pect that the overall replication process is primarily se- 
quence specific near to the ligation sites, as it is known 
that mismatches near the ligation site effect the ligation 
the most 2§]. 

We also find that there is no difference in the repli- 
cation rates of symmetric versus asymmetric replicators. 
We see that from equation 18, where only the sum of 
the oligomer lengths appear:while the longer oligomer 
of an asymmetric replicator has a high binding affin- 
ity to the template and therefore promotes the forma- 
tion of a hybridization complex, the short oligomer has 
a smaller binding affinity, such that the total asymmet- 
ric hybridization complex is as stable as its symmetric 
counterpart. 

We emphasize that our approach hinges on the as- 
sumption that ligation is the rate limiting step of the 
replication reaction. Due to the temperature scaling of 
the diffusion, hybridization, and ligation processes, our 
approach breaks down for very low temperatures or very 
long template strands. As discussed in Section [3J equa- 
tion ((2T|) . for long strands and low temperatures the de- 
hybridization of the templates becomes the rate limit- 
ing step. Exactly what happens in the transition region 
between these two limits requires a more detailed non- 
equilibrium analysis and is outside the scope of this in- 
vestigation. The grey shaded area in Figs. [S] and [TU] de- 
picts the expected landscape for the replication rate as we 
approach this transition zone from the regime where lig- 
ation is rate limiting, and it is clearly seen how the repli- 
cation rate levels off as temperature decreases and the 
sequence length increases. In any event, we would expect 
the existence of a true optimal temperature for a given 
strand length, and equally a true optimal strand length 
for a given temperature, such that replication rates are 
maximized. 

In the context of origin of life research, where the tem- 
perature is given by the environment, but nucleic acid 
strands are subject to mutations, our findings suggest 
the existence of a critical temperature T*, below which 
evolution would select longer nucleic acid replicators that 
could have resulted from mutational elongations, thereby 
promoting an increase of the potential information stor- 
age associated with these molecules. Thus, these results 
shed new light on the "Snowball Earth" hypothesis. This 
argument, however, assumes that both templates and 
oligomers elongate to maintain the structure of a mini- 
mal replicator that replicates by ligation of two oligomers 
only. Needless to say, long oligomers of a specific se- 
quence are less frequent in a random pool of substrate, 
such that they do not necessarily benefit from their in- 
creased stability. 

Most importantly, in the context of minimal replica- 
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tor experiments and applications, e.g. in protocell as 
well as molecular computing and fabrication research, our 
findings suggest a qualitative recipe for obtaining high 
replication yields, as they relate experimentally accessi- 
ble data such as melting temperatures and ligation rate 
to the critical strand length (Eq. [T71) and temperature 
(Eq.IHl). 
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